Skip to content

Updated received trickled candidates handling#877

Open
sirzooro wants to merge 1 commit intopion:masterfrom
sirzooro:df_trickle_fixes
Open

Updated received trickled candidates handling#877
sirzooro wants to merge 1 commit intopion:masterfrom
sirzooro:df_trickle_fixes

Conversation

@sirzooro
Copy link
Contributor

@sirzooro sirzooro commented Jan 20, 2026

Implements RFC 8838 §11.4 (“Receiving Trickled Candidates”) behavior so that when a remote peer-reflexive (prflx) candidate is discovered first, a later signaled equivalent candidate (host/srflx/relay) replaces it in-place while preserving checklist/pair behavior.

Key Changes

  • prflx replacement on trickle (RFC 8838 §11.4)

    • Replace redundant remote prflx candidates when a signaled candidate with the same transport address arrives.
    • Upgrade existing checklist pairs in-place (remote candidate swap), while preserving the pair priority.
    • Update local candidate remote-candidate caches so subsequent lookups use the signaled candidate.
    • Avoid creating duplicate pairs when the replacement occurs.
  • prflx priority from inbound STUN (RFC 5245 §7.1.3.2.1)

    • When a prflx candidate is discovered via an inbound Binding Request, set its priority from the STUN PRIORITY attribute.

- Candidate pair priority correctness (RFC 5245)
- Fix pair-priority computation to use the RFC 5245 formula with $2^{32}$ (not $2^{32}-1$).
- Add an internal helper for the computation and make it overflow-safe via saturation.
- Add a pair-level priority override so prflx→signaled replacement can preserve the previously computed pair priority.
(Restored original code, this change was not needed)

  • Handler notifier deadlock/starvation fix
    • Split the notifier’s single running flag into independent running state for connection-state, candidate, and selected-pair queues.
    • Prevent callback starvation that could cause tests (and user code) waiting on connection-state transitions to hang.

- Test stability improvements (Active TCP)
- Prefer non-link-local IPv6 addresses and skip IPv6 cases when only link-local addresses exist.
- Use mDNS on non-Windows platforms but disable it on Windows to reduce flakiness.
- Increase the per-subtest timeout to reduce intermittent CI/dev timeouts.
(Moved to another PR)

Reference issue

Fixes #622

@codecov
Copy link

codecov bot commented Jan 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.80%. Comparing base (7aa0886) to head (e10e15f).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #877      +/-   ##
==========================================
+ Coverage   88.53%   88.80%   +0.27%     
==========================================
  Files          44       44              
  Lines        5540     5576      +36     
==========================================
+ Hits         4905     4952      +47     
+ Misses        440      431       -9     
+ Partials      195      193       -2     
Flag Coverage Δ
go 88.80% <100.00%> (+0.27%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request implements RFC 8838 §11.4 behavior for receiving trickled candidates, ensuring that when a remote peer-reflexive (prflx) candidate is discovered first via STUN, a later signaled equivalent candidate (host/srflx/relay) properly replaces it in-place while preserving ICE checklist behavior and pair priorities.

Changes:

  • Implements RFC 8838 §11.4 prflx replacement logic when signaled candidates arrive with the same transport address
  • Fixes RFC 5245 pair priority calculation to use 2^32 instead of 2^32-1, with overflow protection
  • Adds priority override mechanism to preserve pair priority during prflx→signaled candidate replacement
  • Fixes handler notifier deadlock by splitting the single running flag into independent flags for each queue type
  • Improves Active TCP test stability through better IPv6 address handling and platform-specific mDNS configuration

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated no comments.

Show a summary per file
File Description
candidatepair.go Adds priority override mechanism and fixes RFC 5245 pair priority calculation to use 2^32 with overflow protection
candidatepair_test.go Adds helper functions for creating candidates with custom priorities and updates tests to verify correct priority calculations
agent.go Implements prflx replacement logic, adds sameTransportAddress comparison, replaceRemoteCandidateCacheValues helper, duplicate pair prevention, and STUN PRIORITY attribute extraction
agent_test.go Adds comprehensive tests for prflx priority from STUN and prflx replacement by host/srflx/relay candidates
agent_handlers.go Fixes deadlock by splitting single running flag into separate flags for connection states, candidates, and candidate pairs
agent_handlers_test.go Updates tests to verify the three separate running flags
active_tcp_test.go Improves test stability by preferring non-link-local IPv6 addresses on all platforms, conditionally using mDNS based on platform, and increasing timeout
.gitignore Adds .vscode/ to ignore list for editor-specific configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@JoTurk JoTurk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe i missed something but i don't see where we implement the re-sort part in the spec?

If the agent finds a redundancy between two pairs and one of those pairs contains a newly received remote candidate whose type is peer-reflexive, the agent SHOULD discard the pair containing that candidate, set the priority of the existing pair to the priority of the discarded pair, and re-sort the checklist. (This policy helps to eliminate problems with remote peer-reflexive candidates for which a STUN Binding request is received before signaling of the candidate is trickled to the receiving agent, such as a different view of pair priorities between the local agent and the remote agent, because the same candidate could be perceived as peer-reflexive by one agent and as server-reflexive by the other agent.)

Comment on lines 35 to 65
@@ -53,17 +55,13 @@ func ipv6Available(t *testing.T) bool {
_, localAddrs, err := localInterfaces(net, problematicNetworkInterfaces, nil, []NetworkType{NetworkTypeTCP6}, false)
require.NoError(t, err)

if runtime.GOOS == "darwin" {
for _, addr := range localAddrs {
if !addr.addr.IsLinkLocalUnicast() {
return true
}
for _, addr := range localAddrs {
if !addr.addr.IsLinkLocalUnicast() {
return true
}

return false
}

return len(localAddrs) > 0
return false
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this for mac only because I wasn't sure if it affects all the platforms we support. anyways, this should be done in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will move this to another PR.

Comment on lines 94 to 98
if ipv6Available(t) {
// mDNS on Windows is flaky in CI/dev environments; prefer it elsewhere.
// if we don't use mDNS, we will very likely be filtering out location tracked ips.
useMDNS := runtime.GOOS != "windows"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all the test adjustments should be done in separate PRs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will move this to another PR.

agent.go Outdated
Comment on lines 957 to 970
func sameTransportAddress(a, b Candidate) bool {
if a == nil || b == nil {
return false
}

addrA := removeZoneIDFromAddress(a.Address())
addrB := removeZoneIDFromAddress(b.Address())

return a.NetworkType() == b.NetworkType() &&
a.Component() == b.Component() &&
addrA == addrB &&
a.Port() == b.Port() &&
a.TCPType() == b.TCPType()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need another equality helper when we have candidate-base's Equal which also compares RelatedAddress.
If it's about fixing the equality for ipv6 addresses with zones, I think we should fix the original helper, I didn't check if works for Ipv6 addresses with zones, but we did fix it for turn recently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Equal also compares Type and RelatedAddress, so it is not suitable here. My version also checks if components are equal (maybe Equal should do this too). I added part to remove IPv6 zones after I got comment to ignore them in one of my recent PRs here.

agent.go Outdated
Comment on lines 1058 to 1071
// RFC 8838 §11.4: If a trickled candidate is redundant with an existing
// peer-reflexive candidate (same transport address), prefer the signaled
// candidate and replace the peer-reflexive one.
var replacedPrflx []Candidate
if cand.Type() != CandidateTypePeerReflexive {
for i := 0; i < len(set); i++ {
existing := set[i]
if existing.Type() == CandidateTypePeerReflexive && sameTransportAddress(existing, cand) {
replacedPrflx = append(replacedPrflx, existing)
set = append(set[:i], set[i+1:]...)
i--
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we're in the process in cleaning ICE #854 and it would be cool to clean new code, and not add new nolints for cyclop or gocognit.

I think this should be refactored to a another function but it's not a blocker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, moved code to replaceRedundantPeerReflexiveCandidates

@sirzooro
Copy link
Contributor Author

maybe i missed something but i don't see where we implement the re-sort part in the spec?

Explicit sorting is not implemented, even that RFC 5245 section 2.2 says this:

The basic principle of the connectivity checks is simple:

  1. Sort the candidate pairs in priority order.
  2. Send checks on each candidate pair in priority order.
  3. Acknowledge checks received from the other agent.

This is implicitly checked in getBestValidCandidatePair() / getBestAvailableCandidatePair(). Lack of sorting only affects order in which candidates are contacted in pingAllCandidates() and keepalive loops (now it is insertion order).

I will add new test that verifies that host-prflx candidate is not selected (prflxAcceptanceMinWait does not pass), and is accepted after conversion to host-host one (hostAcceptanceMinWait is set to 0). I can add this sorting too if you think it is important.

@sirzooro sirzooro force-pushed the df_trickle_fixes branch 3 times, most recently from 83347bc to c73a572 Compare January 25, 2026 17:55
@sirzooro
Copy link
Contributor Author

Done, @JoTurk please take a look again.

@sirzooro sirzooro force-pushed the df_trickle_fixes branch 6 times, most recently from 71526d2 to e292f9f Compare January 31, 2026 17:11
Implements RFC 8838 section 11.4 (Receiving Trickled Candidates)
behavior so that when a remote peer-reflexive (prflx) candidate is
discovered first, a later signaled equivalent candidate
(host/srflx/relay) replaces it in-place while preserving checklist/pair
behavior.
Set priority of prflx candidate to value received in STUN PRIORITY
attribute.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

Candidate type is not changed from peer-reflexive to host

2 participants